pic
Personal
Website

Performance Improvements in Julia (Part 2): Tuples and Static Arrays

Martin Alfaro - PhD in Economics
June 19, 2023
Remark
The script in this post is available here under the name allCode.jl. It's been tested under Julia 1.9.

For more accurate benchmarks, each variable is wrapped in ref(x) = (Ref(x))[] and interpolated through $.

Introduction

We continue analyzing approaches to enhancing performance in Julia. The focus is in particular on methods that are relatively easy to implement, in the sense of not requiring modifying the code substantially.

In the previous post, we explored lazy broadcast. This technique is useful when multiple intermediate steps are needed to obtain a final result. In this post, we'll delve into another approach to improving performance: tuples and static vectors. They are helpful when we work with a collection of objects that have a fixed size and contain a small number of elements. While the exact definition of small is unspecified, a rough guideline is to consider collections with less than 100 elements. In fact, I recommend using tuples and static vectors only when their benefits are evident, which typically means collection with about 20 elements.

We'll illustrate their application through tuples, as static vectors are tuples under the hood. It's worth remarking that static arrays are more versatile than tuples, as they can accommodate matrices and mutable objects. Nevertheless, we'll treat them as the same, since these particular uses won't be part of the post.

The post's main conclusion is that tuples/static arrays can be quite powerful. But you need to be particularly careful, as their misuse can introduce type instabilities and hurt performance severely.

Motivating Example

Let's concentrate on a concrete situation to facilitate our intuition. Suppose that a university is assessing potential students for their admission. As the university has too many candidates, they decide to perform a screening procedure. This procedure involves the candidates taking a pre-test, and then the university comparing their scores with the admission thresholds of previous years. Only those candidates above the threshold of at least one previous year will be considered.

To formalize the situation, let's create some mock data. The details for constructing the dataset are irrelevant for our purposes, and hence we'll omit them. [note] See the code uploaded if you want to review the procedure. What matters is that we'll use two variables, named pre_scores and past_thresholds, which are created through the function generate_variables.

Code to Set the Scenario
Random.seed!(123)    #setting the seed for reproducibility 
pre_scores, past_thresholds = generate_variables(nr_years = 20, nr_people = 10_000)

isabove(score, thresholds)  = any(score .> thresholds)
areabove(score, thresholds) = isabove.(score, Ref(thresholds))

past_thresholds captures the thresholds of previous years, which is 20 years in our example. In turn, pre_scores represents the scores of the candidates, with 10,000 applicants assumed in our example. Likewise, the function isabove defines the university's procedure for a single student. It checks whether the candidate is above the threshold in at least one of the 20 previous years. [note] In case you're not familiar with the function any, it takes a vector as an argument and returns true if at least one of the vector's elements is true. The function areabove extends this by applying isabove to every candidate. The variables look as follows.

Output in REPL
julia> pre_scores
10000-element Vector{Float64}:
 1.8267202241683234
 3.161299600281294
 4.829613855482266
 â‹®
 2.2488030844480194
 2.162875169226787

julia> past_thresholds
20-element Vector{Float64}:
 5.678547557310528
 9.325358309487532
 5.1380234173752815
 â‹®
 5.682373858994395
 5.285879918833311

Comparing Vector vs Tuple

Our goal is to compare the performance of the function isabove based on how we handle past_thresholds. In the baseline scenario, we consider past_thresholds as an ordinary vector, and compare it against past_thresholds defined as a tuple/static vector. past_thresholds is well-suited for the use of tuples/static vectors, as it consists of 20 elements and its size will remain fixed throughtout the analysis.

Since static vectors are ultimately tuples, we exemplify the results by using the latter. The following code snippet reveals a significant increase in performance, even when we consider a relatively low number of candidates. Furthermore, it shows the simplicity of its implementation, requiring only the application of the function Tuple. [note] The function ref wrapping each variable is employed to prevent the compiler from optimizing the benchmark. Otherwise, the time reported would be artificially low, because the processor repeats the result rather than the operation. See here for instance.

@btime areabove(ref($pre_scores), ref($past_thresholds))
Output in REPL
  515.200 μs (20004 allocations: 943.06 KiB)
@btime areabove(ref($pre_scores), ref(Tuple($past_thresholds)))
Output in REPL
  8.733 μs (24 allocations: 6.03 KiB)

Use of Tuples

In the following, we'll demonstrate that tuples/static vectors need to be handled carefully, as their incorrect usage can easily introduce type instabilities. Keep in mind that running a type-unstable function in Julia not only reduces performance, but basically kills it.

The possibility of type instability arises because tuples/static vectors are a data type defined by both their elements' type (Float64 in our example) and the number of elements contained. This determines that a function is type stable if the compiler knows in particular the tuple's length at the execution time. However, this is not automatically achieved, because Julia chooses a function's method according to the types of its arguments, not its values.

To illustrate the problem, let's consider various approaches to converting past_thresholds into tuples. The first one is simply the approach we used above. The key of this method not creating any issue is that the tuple is passed into areabove as an argument of the function. However, it'd be natural to instead pass past_thresholds as is (i.e., as a vector), and then convert it into a tuple within the function. This is covered by the codes in Tuple 2 to Tuple 5, where the problem may arise.

tuple_thresholds = Tuple(past_thresholds)

@btime areabove(ref($pre_scores), ref($tuple_thresholds)) #type stable
Output in REPL
  8.167 μs (3 allocations: 5.55 KiB)
function areabove(scores, thresholds)
    tuple_thresholds = Tuple(thresholds)
    isabove.(scores, Ref(tuple_thresholds))
end

@btime areabove(ref($pre_scores), ref($past_thresholds)) #type unstable
Output in REPL
  23.100 μs (28 allocations: 6.30 KiB)
function areabove(scores, thresholds) 
    tuple_thresholds = NTuple{length(thresholds), eltype(thresholds)}(thresholds)
    isabove.(scores, Ref(tuple_thresholds))
end

@btime areabove(ref($pre_scores), ref($past_thresholds)) #type unstable
Output in REPL
  23.200 μs (11 allocations: 6.25 KiB)
function areabove(scores, thresholds) 
    tuple_thresholds = NTuple{20,eltype(thresholds)}(thresholds)
    isabove.(scores, Ref(tuple_thresholds))
end

@btime areabove(ref($pre_scores), ref($past_thresholds)) #type stable
Output in REPL
  8.167 μs (3 allocations: 5.55 KiB)
function areabove(scores, thresholds, ::Val{N}) where {N}
    tuple_thresholds = NTuple{N,eltype(thresholds)}(thresholds)
    isabove.(scores, Ref(tuple_thresholds))
end

@btime areabove(ref($pre_scores), ref($past_thresholds), Val(length(ref($past_thresholds)))) #type stable
Output in REPL
  8.333 μs (3 allocations: 5.55 KiB)

Tuple 2 creates a type instability because the compiler analyzes the function, but doesn't figure out in advance the number of elements in tuple_thresholds. Therefore, it treats the tuple as potentially comprising any number of elements. Tuple 3 reveals that the issue is not solved by explicitly indicating the length and type of its elements—Julia doesn't dispatch the method by values, but by types. This means that Julia doesn't choose the function's method by computing length(thresholds) beforehand.

Tuple 4 shows an easy way to restore type stability, which consists in directly indicating the number of elements when creating the tuple. This also demonstrates that the problem is not due to the type of the tuple's elements: we can still use eltype(thresholds), as the compiler knows the type of thresholds in the function areabove. However, the solution is not practical, because it requires indicating the tuple's length in advance.

This leads us to Tuple 5, which is the solution generally used in practice. It uses Val as an artifact for dispatching the method by values. Expressed in words, this entails using Val{N} as a function's argument, where N is the tuple's length. In this way, the compiler knows the number of elements in the tuple, and can dispatch the method accordingly. Notice that Val is called in the function by using curly brackets (i.e., Val(length(past_thresholds))).

When the Type Instabilities Created Constitute a Serious Concern?

If you observe the running times of Tuple 2 and 3 above, you'll notice that the type-instability created isn't a significant problem. However, this only happens because we're immediately passing the tuple to the function isabove, and type instabilities do not arise if we use a tuple/static vector as a function's argument.

This is part of a more general technique called barrier functions. [note] See here for the official documentation's explanation. It entails the presence of an "inner" function solving the type instability of the "parent" function.

The use of barrier functions allows the user to maintain code readibility by avoiding the use of Val. However, it's crucial to keep in mind that barrier functions may not always be a suitable solution, especially if the type instability remains unsolved before a tight loop. This scenario is demonstrated in the following code, where we simplify the illustration by considering an abstract example.

Random.seed!(123)    #setting the seed for reproducibility 
x = rand(10)

function example1(x) 
    tuple_x = Tuple(x)

    for _ in 1:100_000
        log.(tuple_x) 
    end
end

@btime example1(ref($x))    # type-unstable
Output in REPL
  13.301 ms (300011 allocations: 27.47 MiB)
Random.seed!(123)    #setting the seed for reproducibility 
x = rand(10)

function example2(x, ::Val{N}) where {N}
    tuple_x = NTuple{N,eltype(x)}(x)   

    for _ in 1:100_000
        log.(tuple_x) 
    end
end

@btime example2(ref($x), Val(length(ref($x))))  # type-stable
Output in REPL
  2.544 ms (0 allocations: 0 bytes)
Random.seed!(123)    #setting the seed for reproducibility 
x = rand(10)

function barrier_function(x) 
    for _ in 1:100_000
        log.(x) 
    end
end

function example3(x) 
    tuple_x = Tuple(x)
    barrier_function(tuple_x)
end
   
@btime example3(ref($x))  # type-unstable but minimum penalty
Output in REPL
  2.543 ms (11 allocations: 256 bytes)

Static Vectors

We conclude this post by demonstrating how to avoid type instabilities when using static vectors. Essentially, the underlying principles resemble those for tuples, but the syntax for creating static vectors differs.

function with_SVector_barrier(scores, thresholds)
    sthresholds = SVector(thresholds...)
    areabove(pre_scores, sthresholds)
end

@btime with_SVector_barrier(ref($pre_scores), ref($past_thresholds)) #type unstable
Output in REPL
  20.200 μs (26 allocations: 6.38 KiB)
function with_SVector_val(scores, thresholds, ::Val{N}) where {N}
    sthresholds = SVector{N, eltype(thresholds)}(thresholds)
    areabove(scores, sthresholds)
end

@btime with_SVector_val(ref($pre_scores), ref($past_thresholds), Val(length(ref($past_thresholds)))) #type stable
Output in REPL
  19.500 μs (3 allocations: 5.55 KiB)
function with_SVector_macro(scores, thresholds) 
    sthresholds = @SVector [thresholds[i] for i in eachindex(past_thresholds)]
    areabove(scores, sthresholds)
end

@btime with_SVector_macro(ref($pre_scores), ref($past_thresholds)) #type stable
Output in REPL
  19.500 μs (3 allocations: 5.55 KiB)

Notice in particular how a static vector is created in SVector 3. This involves specifying the vector's range by using either a global variable (as in our example with past_thresholds) or a literal value (e.g., 20)—the package StaticArrays dictates that the range must be specified in this manner.